Robust retrieval models for false positive errors in spoken documents
Abstract
Speech recognition errors and out-of-vocabulary (OOV) words, collectively referred to as false negative errors, are common challenges in spoken document processing. To deal with them in spoken content retrieval (SCR), an SCR method that incorporates spoken term detection (STD) as a pre-processing stage (referred to as STD-SCR) has been proposed. However, STD-SCR tends to increase false positive errors in exchange for reducing false negative errors. In this work, we propose retrieval models that are robust to false positive errors by using word co-occurrences. The words that co-occur in a given query are semantically related, so they are likely to co-occur in the document to be retrieved as well. Conversely, if a query word appears in a document without the other query words, it is more likely to be a false positive. We incorporate this idea into two retrieval models commonly used in the literature: the vector space model and the query likelihood model. Our experimental results show that the proposed extensions improve retrieval performance not only for STD-SCR but also for the conventional SCR method.
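The co-occurrence idea can be illustrated with a small sketch. The abstract does not give the authors' formulas, so everything below is an assumption made for illustration: the function names, the `alpha` floor, and the linear discount are hypothetical and are not the weighting used in the paper. The sketch scores a document with a simple TF-IDF sum and scales each matched query term by the fraction of the other query terms that also occur in the document, so a term that appears alone contributes less.

```python
import math
from collections import Counter


def cooccurrence_weight(term, query_terms, doc_counts, alpha=0.5):
    """Hypothetical discount in [alpha, 1].

    Returns 1.0 when every other query term also occurs in the document
    and alpha when none of them do. This is an illustrative choice, not
    the weighting from the paper.
    """
    others = [t for t in set(query_terms) if t != term]
    if not others:
        return 1.0  # single-term query: nothing to co-occur with
    present = sum(1 for t in others if doc_counts.get(t, 0) > 0)
    return alpha + (1.0 - alpha) * present / len(others)


def score(query_terms, doc_terms, idf, alpha=0.5):
    """TF-IDF sum over matched query terms, scaled by the discount above."""
    doc_counts = Counter(doc_terms)
    total = 0.0
    for term in set(query_terms):
        tf = doc_counts.get(term, 0)
        if tf == 0:
            continue  # term not detected in this document
        weight = cooccurrence_weight(term, query_terms, doc_counts, alpha)
        total += weight * (1.0 + math.log(tf)) * idf.get(term, 0.0)
    return total


# Toy data (made up): doc_a contains both query terms, doc_b only one.
query = ["speech", "retrieval"]
idf = {"speech": 1.0, "retrieval": 1.0}
doc_a = ["speech", "retrieval", "model"]
doc_b = ["speech", "noise", "noise"]
score_a = score(query, doc_a, idf)
score_b = score(query, doc_b, idf)
```

In this toy setup the lone match of "speech" in `doc_b` is discounted by `alpha`, while both matches in `doc_a` keep full weight, so `doc_a` ranks higher. A similar discount could in principle be applied to term probabilities in a query likelihood model, but that is likewise an assumption rather than something stated in the abstract.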
Similar resources
Graph-based Document Expansion and Robust SCR Models for False Positives: Experiments at the NTCIR-12 SpokenQuery&Doc-2
In this paper, we report our experiments at the NTCIR-12 SpokenQuery&Doc-2 task. We participated in the spoken-query-driven spoken content retrieval (SQ-SCR) subtasks of SpokenQuery&Doc-2. We submitted two types of results: a conventional spoken content retrieval method (referred to as C-SCR) and an STD-based approach to SCR (referred to as STD-SCR). The latter was proposed in order to deal with spe...
Efficient Interactive Retrieval of Spo Ranked by Reinforcem
Unlike written documents, spoken documents are difficult to display on the screen; it is also difficult for users to browse these documents during retrieval. It has been proposed recently to use interactive multi-modal dialogues to help the user navigate through a spoken document archive to retrieve the desired documents. This interaction is based on a topic hierarchy constructed by the key ter...
Towards robust methods for spoken document retrieval
In this paper, we investigate a number of robust indexing and retrieval methods in an effort to improve spoken document retrieval performance in the presence of speech recognition errors. In particular, we examine expanding the original query representation to include confusible terms; developing a new document-query retrieval measure based on approximate matching that is less sensitive to reco...
Effects of Query Expansion for Spoken Document Passage Retrieval
One of the major challenges for spoken document retrieval is how to handle speech recognition errors within the target documents. Query expansion is promising for this challenge. In this paper, we apply relevance models, a type of query expansion method, for the spoken document passage retrieval task. We adapted the original relevance model for passage retrieval. We also extended it to benefit ...
Robust Techniques for Organizing and Retrieving Spoken Documents
Information retrieval tasks such as document retrieval and topic detection and tracking (TDT) show little degradation when applied to speech recognizer output. We claim that the robustness of the process is because of inherent redundancy in the problem: not only are words repeated, but semantically related words also provide support. We show how document and query expansion can enhance that red...